NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Sparse Vector-Matrix Multiplication Acceleration in Diode-Selected Crossbars

https://doi.org/10.1109/TVLSI.2021.3114186

Jao, Nicholas; Ramanathan, Akshay Krishna; Sampson, John; Narayanan, Vijaykrishnan (December 2021, IEEE Transactions on Very Large Scale Integration (VLSI) Systems)

Full Text Available
MaxTracker: Continuously Tracking the Maximum Computation Progress for Energy Harvesting ReRAM-based CNN Accelerators

https://doi.org/10.1145/3477009

Qiu, Keni; Jao, Nicholas; Zhou, Kunyu; Liu, Yongpan; Sampson, Jack; Kandemir, Mahmut Taylan; Narayanan, Vijaykrishnan (October 2021, ACM Transactions on Embedded Computing Systems)
null (Ed.)
There is an ongoing trend to increasingly offload inference tasks, such as CNNs, to edge devices in many IoT scenarios. As energy harvesting is an attractive IoT power source, recent ReRAM-based CNN accelerators have been designed for operation on harvested energy. When addressing the instability problems of harvested energy, prior optimization techniques often assume that the load is fixed, overlooking the close interactions among input power, computational load, and circuit efficiency, or adapt the dynamic load to match the just-in-time incoming power under a simple harvesting architecture with no intermediate energy storage. Targeting a more efficient harvesting architecture equipped with both energy storage and energy delivery modules, this paper is the first effort to target whole system, end-to-end efficiency for an energy harvesting ReRAM-based accelerator. First, we model the relationships among ReRAM load power, DC-DC converter efficiency, and power failure overhead. Then, a maximum computation progress tracking scheme ( MaxTracker ) is proposed to achieve a joint optimization of the whole system by tuning the load power of the ReRAM-based accelerator. Specifically, MaxTracker accommodates both continuous and intermittent computing schemes and provides dynamic ReRAM load according to harvesting scenarios. We evaluate MaxTracker over four input power scenarios, and the experimental results show average speedups of 38.4%/40.3% (up to 51.3%/84.4%), over a full activation scheme (with energy storage) and order-of-magnitude speedups over the recently proposed (energy storage-less) ResiRCA technique. Furthermore, we also explore MaxTracker in combination with the Capybara reconfigurable capacitor approach to offer more flexible tuners and thus further boost the system performance.
more » « less
Full Text Available
NEBULA: A Neuromorphic Spin-Based Ultra-Low Power Architecture for SNNs and ANNs

https://doi.org/10.1109/ISCA45697.2020.00039

Singh, Sonali; Sarma, Anup; Jao, Nicholas; Pattnaik, Ashutosh; Lu, Sen; Yang, Kezhou; Sengupta, Abhronil; Narayanan, Vijaykrishnan; Das, Chita R. (May 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA))
null (Ed.)
Brain-inspired cognitive computing has so far followed two major approaches - one uses multi-layered artificial neural networks (ANNs) to perform pattern-recognition-related tasks, whereas the other uses spiking neural networks (SNNs) to emulate biological neurons in an attempt to be as efficient and fault-tolerant as the brain. While there has been considerable progress in the former area due to a combination of effective training algorithms and acceleration platforms, the latter is still in its infancy due to the lack of both. SNNs have a distinct advantage over their ANN counterparts in that they are capable of operating in an event-driven manner, thus consuming very low power. Several recent efforts have proposed various SNN hardware design alternatives, however, these designs still incur considerable energy overheads.In this context, this paper proposes a comprehensive design spanning across the device, circuit, architecture and algorithm levels to build an ultra low-power architecture for SNN and ANN inference. For this, we use spintronics-based magnetic tunnel junction (MTJ) devices that have been shown to function as both neuro-synaptic crossbars as well as thresholding neurons and can operate at ultra low voltage and current levels. Using this MTJ-based neuron model and synaptic connections, we design a low power chip that has the flexibility to be deployed for inference of SNNs, ANNs as well as a combination of SNN-ANN hybrid networks - a distinct advantage compared to prior works. We demonstrate the competitive performance and energy efficiency of the SNNs as well as hybrid models on a suite of workloads. Our evaluations show that the proposed design, NEBULA, is up to 7.9× more energy efficient than a state-of-the-art design, ISAAC, in the ANN mode. In the SNN mode, our design is about 45× more energy-efficient than a contemporary SNN architecture, INXS. Power comparison between NEBULA ANN and SNN modes indicates that the latter is at least 6.25× more power-efficient for the observed benchmarks.
more » « less
Full Text Available
ResiRCA: A Resilient Energy Harvesting ReRAM Crossbar-Based Accelerator for Intelligent Embedded Processors

https://doi.org/10.1109/HPCA47549.2020.00034

Qiu, Keni; Jao, Nicholas; Zhao, Mengying; Mishra, Cyan Subhra; Gudukbay, Gulsum; Jose, Sethu; Sampson, Jack; Kandemir, Mahmut Taylan; Narayanan, Vijaykrishnan (February 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA))

Many recent works have shown substantial efficiency boosts from performing inference tasks on Internet of Things (IoT) nodes rather than merely transmitting raw sensor data. However, such tasks, e.g., convolutional neural networks (CNNs), are very compute intensive. They are therefore challenging to complete at sensing-matched latencies in ultra-low-power and energy-harvesting IoT nodes. ReRAM crossbar-based accelerators (RCAs) are an ideal candidate to perform the dominant multiplication-and-accumulation (MAC) operations in CNNs efficiently, but conventional, performance-oriented RCAs, while energy-efficient, are power hungry and ill-optimized for the intermittent and unstable power supply of energy-harvesting IoT nodes. This paper presents the ResiRCA architecture that integrates a new, lightweight, and configurable RCA suitable for energy harvesting environments as an opportunistically executing augmentation to a baseline sense-and-transmit battery-powered IoT node. To maximize ResiRCA throughput under different power levels, we develop the ResiSchedule approach for dynamic RCA reconfiguration. The proposed approach uses loop tiling-based computation decomposition, model duplication within the RCA, and inter-layer pipelining to reduce RCA activation thresholds and more closely track execution costs with dynamic power income. Experimental results show that ResiRCA together with ResiSchedule achieve average speedups and energy efficiency improvements of 8× and 14× respectively compared to a baseline RCA with intermittency-unaware scheduling.
more » « less
Full Text Available

Search for: All records